Approximation Algorithms for k-Anonymity

Authors

  • Gagan Aggarwal
  • Tomas Feder
  • Krishnaram Kenthapadi
  • Rajeev Motwani
  • Rina Panigrahy
Abstract

We consider the problem of releasing a table containing personal records while ensuring individual privacy and maintaining data integrity to the extent possible. One of the techniques proposed in the literature is k-anonymization. A release is considered k-anonymous if the information corresponding to any individual in the release cannot be distinguished from that of at least k − 1 other individuals whose information also appears in the release. In order to achieve k-anonymization, some of the entries of the table are either suppressed or generalized (e.g., an Age value of 23 could be changed to the Age range 20-25). The goal is to lose as little information as possible while ensuring that the release is k-anonymous. This optimization problem is referred to as the k-Anonymity problem. We show that the k-Anonymity problem is NP-hard even when the attribute values are ternary and we are allowed only to suppress entries. On the positive side, we provide an O(k)-approximation algorithm for the problem. We also give improved positive results for the interesting cases with specific values of k: in particular, we give a 1.5-approximation algorithm for the special case of 2-Anonymity, and a 2-approximation algorithm for 3-Anonymity.

∗ A preliminary version of this paper appeared in the Proceedings of the 10th International Conference on Database Theory (ICDT’05) (Aggarwal, Feder, Kenthapadi, Motwani, Panigrahy, Thomas, and Zhu, 2005). Manuscript received 6 Aug 2005, accepted 8 Nov 2005, published 20 Nov 2005.
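To make the definition concrete, here is a minimal sketch of checking k-anonymity when suppression is the only operation allowed. It is an illustration of the definition only, not the approximation algorithm from the paper. The names `is_k_anonymous` and `suppression_cost`, the `*` marker for a suppressed entry, and the sample tables are all assumptions introduced for this example.

```python
from collections import Counter

SUPPRESSED = "*"  # hypothetical marker for a suppressed entry


def is_k_anonymous(rows, k):
    """Return True if every row is identical to at least k - 1 other rows."""
    counts = Counter(tuple(row) for row in rows)
    return all(count >= k for count in counts.values())


def suppression_cost(rows):
    """Count suppressed entries, i.e. the information lost by the release."""
    return sum(cell == SUPPRESSED for row in rows for cell in row)


# A small table with ternary attribute values (made-up data).
original = [
    ("0", "1", "2"),
    ("0", "1", "1"),
    ("2", "0", "0"),
    ("2", "1", "0"),
]

# One possible 2-anonymous release obtained by suppressing four entries.
released = [
    ("0", "1", SUPPRESSED),
    ("0", "1", SUPPRESSED),
    ("2", SUPPRESSED, "0"),
    ("2", SUPPRESSED, "0"),
]

print(is_k_anonymous(original, 2))   # every original row is unique
print(is_k_anonymous(released, 2))   # rows now come in identical pairs
print(suppression_cost(released))    # number of "*" entries
```

The optimization problem described in the abstract asks for a release that passes the first check while keeping the second quantity as small as possible; the sketch only verifies a given release and does not search for one.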


Similar resources

Theory of Privacy and Anonymity

3 k-Anonymity with hierarchy-based generalization
  3.1 Problem complexity
  3.2 Algorithms for k-anonymity
    3.2.1 Samarati’s Algorithm
    3.2.2 Incognito
...


Enforcement of k-anonymity Through Generalization and Suppression

While a limited data set has been shown not to guarantee anonymity, k-anonymity was proposed by Dr. Latanya Sweeney of MIT as an alternative way to release public information while ensuring both data privacy and data integrity [1, 2, 3]. k-anonymity is provided by using generalization and suppression techniques. Generalization involves replacing a value with a less specific but semantically consistent v...
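The two operations can be sketched in a few lines. This is a hedged illustration, not code from the cited work: `generalize_age`, `suppress`, the bucket width of 5, and the choice to treat the upper bound of each range as exclusive are assumptions picked so that an Age of 23 maps to the range 20-25.

```python
def generalize_age(age, width=5):
    """Map an exact age to a coarser range label such as '20-25'.

    Assumption: the upper bound of the label is exclusive, so 25 falls
    into '25-30' rather than '20-25'.
    """
    low = (age // width) * width
    return f"{low}-{low + width}"


def suppress(_value):
    """Replace a value with a marker that reveals nothing about it."""
    return "*"


ages = [23, 21, 24, 37]  # made-up sample values
generalized = [generalize_age(a) for a in ages]
print(generalized)
print(suppress(37))
```

Generalization keeps part of the information (the first three ages remain distinguishable from the fourth), whereas suppression removes the value entirely.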


Adaptive Anonymity via b-Matching

The adaptive anonymity problem is formalized where each individual shares their data along with an integer value to indicate their personal level of desired privacy. This problem leads to a generalization of k-anonymity to the b-matching setting. Novel algorithms and theory are provided to implement this type of anonymity. The relaxation achieves better utility, admits theoretical privacy guara...


Streaming Algorithms for k-Center Clustering with Outliers and with Anonymity

Clustering is a common problem in the analysis of large data sets. Streaming algorithms, which make a single pass over the data set using small working memory and produce a clustering comparable in cost to the optimal offline solution, are especially useful. We develop the first streaming algorithms achieving a constant-factor approximation to the cluster radius for two variations of the k-cent...


Utility-Preserving k-Anonymity

As technology advances and more and more person-specific data like health information becomes publicly available, much attention is being given to confidentiality and privacy protection. On one hand, increased availability of information can lead to advantageous knowledge discovery; on the other hand, this information belongs to individuals and their identities must not be disclosed without con...




Publication date: 2005